Search CORE

568 research outputs found

Deep Reinforcement Learning in Surgical Robotics: Enhancing the Automation Level

Author: Qian Cheng
Ren Hongliang
Publication venue
Publication date: 01/09/2023
Field of study

Surgical robotics is a rapidly evolving field that is transforming the landscape of surgeries. Surgical robots have been shown to enhance precision, minimize invasiveness, and alleviate surgeon fatigue. One promising area of research in surgical robotics is the use of reinforcement learning to enhance the automation level. Reinforcement learning is a type of machine learning that involves training an agent to make decisions based on rewards and punishments. This literature review aims to comprehensively analyze existing research on reinforcement learning in surgical robotics. The review identified various applications of reinforcement learning in surgical robotics, including pre-operative, intra-body, and percutaneous procedures, listed the typical studies, and compared their methodologies and results. The findings show that reinforcement learning has great potential to improve the autonomy of surgical robots. Reinforcement learning can teach robots to perform complex surgical tasks, such as suturing and tissue manipulation. It can also improve the accuracy and precision of surgical robots, making them more effective at performing surgeries

arXiv.org e-Print Archive

Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery

Author: Bai Long
Islam Mobarakol
Ren Hongliang
Publication venue
Publication date: 11/07/2023
Field of study

Medical students and junior surgeons often rely on senior surgeons and specialists to answer their questions when learning surgery. However, experts are often busy with clinical and academic work, and have little time to give guidance. Meanwhile, existing deep learning (DL)-based surgical Visual Question Answering (VQA) systems can only provide simple answers without the location of the answers. In addition, vision-language (ViL) embedding is still a less explored research in these kinds of tasks. Therefore, a surgical Visual Question Localized-Answering (VQLA) system would be helpful for medical students and junior surgeons to learn and understand from recorded surgical videos. We propose an end-to-end Transformer with Co-Attention gaTed Vision-Language (CAT-ViL) for VQLA in surgical scenarios, which does not require feature extraction through detection models. The CAT-ViL embedding module is designed to fuse heterogeneous features from visual and textual sources. The fused embedding will feed a standard Data-Efficient Image Transformer (DeiT) module, before the parallel classifier and detector for joint prediction. We conduct the experimental validation on public surgical videos from MICCAI EndoVis Challenge 2017 and 2018. The experimental results highlight the superior performance and robustness of our proposed model compared to the state-of-the-art approaches. Ablation studies further prove the outstanding performance of all the proposed components. The proposed method provides a promising solution for surgical scene understanding, and opens up a primary step in the Artificial Intelligence (AI)-based VQLA system for surgical training. Our code is publicly available.Comment: To appear in MICCAI 2023. Code availability: https://github.com/longbai1006/CAT-Vi

arXiv.org e-Print Archive

Revisiting Distillation for Continual Learning on Visual Question Localized-Answering in Robotic Surgery

Author: Bai Long
Islam Mobarakol
Ren Hongliang
Publication venue
Publication date: 22/07/2023
Field of study

The visual-question localized-answering (VQLA) system can serve as a knowledgeable assistant in surgical education. Except for providing text-based answers, the VQLA system can highlight the interested region for better surgical scene understanding. However, deep neural networks (DNNs) suffer from catastrophic forgetting when learning new knowledge. Specifically, when DNNs learn on incremental classes or tasks, their performance on old tasks drops dramatically. Furthermore, due to medical data privacy and licensing issues, it is often difficult to access old data when updating continual learning (CL) models. Therefore, we develop a non-exemplar continual surgical VQLA framework, to explore and balance the rigidity-plasticity trade-off of DNNs in a sequential learning paradigm. We revisit the distillation loss in CL tasks, and propose rigidity-plasticity-aware distillation (RP-Dist) and self-calibrated heterogeneous distillation (SH-Dist) to preserve the old knowledge. The weight aligning (WA) technique is also integrated to adjust the weight bias between old and new tasks. We further establish a CL framework on three public surgical datasets in the context of surgical settings that consist of overlapping classes between old and new surgical VQLA tasks. With extensive experiments, we demonstrate that our proposed method excellently reconciles learning and forgetting on the continual surgical VQLA over conventional CL methods. Our code is publicly accessible.Comment: To appear in MICCAI 2023. Code availability: https://github.com/longbai1006/CS-VQL

arXiv.org e-Print Archive

Sim-to-Real Segmentation in Robot-assisted Transoral Tracheal Intubation

Author: Bai Long
Lai Jiewen
Ren Hongliang
Ren Tian-Ao
Wang Guankun
Publication venue
Publication date: 19/05/2023
Field of study

Robotic-assisted tracheal intubation requires the robot to distinguish anatomical features like an experienced physician using deep-learning techniques. However, real datasets of oropharyngeal organs are limited due to patient privacy issues, making it challenging to train deep-learning models for accurate image segmentation. We hereby consider generating a new data modality through a virtual environment to assist the training process. Specifically, this work introduces a virtual dataset generated by the Simulation Open Framework Architecture (SOFA) framework to overcome the limited availability of actual endoscopic images. We also propose a domain adaptive Sim-to-Real method for oropharyngeal organ image segmentation, which employs an image blending strategy called IoU-Ranking Blend (IRB) and style-transfer techniques to address discrepancies between datasets. Experimental results demonstrate the superior performance of the proposed approach with domain adaptive models, improving segmentation accuracy and training stability. In the practical application, the trained segmentation model holds great promise for robot-assisted intubation surgery and intelligent surgical navigation.Comment: Extended abstract in IEEE ICRA 2023 Workshop (New Evolutions in Surgical Robotics: Embracing Multimodal Imaging Guidance, Intelligence, and Bio-inspired Mechanisms

arXiv.org e-Print Archive

Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs

Author: Bai Long
Lai Jiewen
Ren Hongliang
Ren Tian-Ao
Wang Guankun
Publication venue
Publication date: 27/07/2023
Field of study

Video-assisted transoral tracheal intubation (TI) necessitates using an endoscope that helps the physician insert a tracheal tube into the glottis instead of the esophagus. The growing trend of robotic-assisted TI would require a medical robot to distinguish anatomical features like an experienced physician which can be imitated by utilizing supervised deep-learning techniques. However, the real datasets of oropharyngeal organs are often inaccessible due to limited open-source data and patient privacy. In this work, we propose a domain adaptive Sim-to-Real framework called IoU-Ranking Blend-ArtFlow (IRB-AF) for image segmentation of oropharyngeal organs. The framework includes an image blending strategy called IoU-Ranking Blend (IRB) and style-transfer method ArtFlow. Here, IRB alleviates the problem of poor segmentation performance caused by significant datasets domain differences; while ArtFlow is introduced to reduce the discrepancies between datasets further. A virtual oropharynx image dataset generated by the SOFA framework is used as the learning subject for semantic segmentation to deal with the limited availability of actual endoscopic images. We adapted IRB-AF with the state-of-the-art domain adaptive segmentation models. The results demonstrate the superior performance of our approach in further improving the segmentation accuracy and training stability.Comment: The manuscript is accepted by Medical & Biological Engineering & Computing. Code and dataset: https://github.com/gkw0010/EISOST-Sim2Real-Dataset-Releas

arXiv.org e-Print Archive

Generalizing Surgical Instruments Segmentation to Unseen Domains with One-to-Many Synthesis

Author: Islam Mobarakol
Ren Hongliang
Wang An
Xu Mengya
Publication venue
Publication date: 28/06/2023
Field of study

Despite their impressive performance in various surgical scene understanding tasks, deep learning-based methods are frequently hindered from deploying to real-world surgical applications for various causes. Particularly, data collection, annotation, and domain shift in-between sites and patients are the most common obstacles. In this work, we mitigate data-related issues by efficiently leveraging minimal source images to generate synthetic surgical instrument segmentation datasets and achieve outstanding generalization performance on unseen real domains. Specifically, in our framework, only one background tissue image and at most three images of each foreground instrument are taken as the seed images. These source images are extensively transformed and employed to build up the foreground and background image pools, from which randomly sampled tissue and instrument images are composed with multiple blending techniques to generate new surgical scene images. Besides, we introduce hybrid training-time augmentations to diversify the training data further. Extensive evaluation on three real-world datasets, i.e., Endo2017, Endo2018, and RoboTool, demonstrates that our one-to-many synthetic surgical instruments datasets generation and segmentation framework can achieve encouraging performance compared with training with real data. Notably, on the RoboTool dataset, where a more significant domain gap exists, our framework shows its superiority of generalization by a considerable margin. We expect that our inspiring results will attract research attention to improving model generalization with data synthesizing.Comment: First two authors contributed equally. Accepted by IROS202

arXiv.org e-Print Archive

The Quantitative Diagnosis Method of Rubbing Rotor System

Author: Bangchun Wen
Hongliang Yao
Qi Xu
Zhaohui Ren
Publication venue: Vibroengineering
Publication date: 20/11/2013
Field of study

The dynamics of the rubbing rotor system is analyzed by applying harmonic balance method. The relationship between harmonic components in the response of the rubbing rotor system and the dynamic stiffness matrix of the fault free rotor system is revealed, based on which a new model based method for rubbing identification is presented. By applying this method, the fault location and rubbing forces of the single rubbing rotor system can be identified by using vibration data of only two nodes, the rubbing locations and rubbing forces of the double rubbing rotor system can be identified by using vibration data of three nodes. The numerical simulations and experiments on the rotor test-rig are carried out to verify the efficiency of the present method

Curriculum-Based Augmented Fourier Domain Adaptation for Robust Medical Image Segmentation

Author: Islam Mobarakol
Ren Hongliang
Wang An
Xu Mengya
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/07/2023
Field of study

Accurate and robust medical image segmentation is fundamental and crucial for enhancing the autonomy of computer-aided diagnosis and intervention systems. Medical data collection normally involves different scanners, protocols, and populations, making domain adaptation (DA) a highly demanding research field to alleviate model degradation in the deployment site. To preserve the model performance across multiple testing domains, this work proposes the Curriculum-based Augmented Fourier Domain Adaptation (Curri-AFDA) for robust medical image segmentation. In particular, our curriculum learning strategy is based on the causal relationship of a model under different levels of data shift in the deployment phase, where the higher the shift is, the harder to recognize the variance. Considering this, we progressively introduce more amplitude information from the target domain to the source domain in the frequency space during the curriculum-style training to smoothly schedule the semantic knowledge transfer in an easier-to-harder manner. Besides, we incorporate the training-time chained augmentation mixing to help expand the data distributions while preserving the domain-invariant semantics, which is beneficial for the acquired model to be more robust and generalize better to unseen domains. Extensive experiments on two segmentation tasks of Retina and Nuclei collected from multiple sites and scanners suggest that our proposed method yields superior adaptation and generalization performance. Meanwhile, our approach proves to be more robust under various corruption types and increasing severity levels. In addition, we show our method is also beneficial in the domain-adaptive classification task with skin lesion datasets. The code is available at https://github.com/lofrienger/Curri-AFDA. Note to Practitioners —Medical image segmentation is key to improving computer-assisted diagnosis and intervention autonomy. However, due to domain gaps between different medical sites, deep learning-based segmentation models frequently encounter performance degradation when deployed in a novel domain. Moreover, model robustness is also highly expected to mitigate the effects of data corruption. Considering all these demanding yet practical needs to automate medical applications and benefit healthcare, we propose the Curriculum-based Fourier Domain Adaptation (Curri-AFDA) for medical image segmentation. Extensive experiments on two segmentation tasks with cross-domain datasets show the consistent superiority of our method regarding adaptation and generalization on multiple testing domains and robustness against synthetic corrupted data. Besides, our approach is independent of image modalities because its efficacy does not rely on modality-specific characteristics. In addition, we demonstrate the benefit of our method for image classification besides segmentation in the ablation study. Therefore, our method can potentially be applied in many medical applications and yield improved performance. Future works may be extended by exploring the integration of curriculum learning regime with Fourier domain amplitude fusion in the testing time rather than in the training time like this work and most other existing domain adaptation works

UCL Discovery